Stimuli associated with primary reinforcers appear themselves to acquire the capacity to strengthen 
behavior. This paper reviews research on the strengthening effects of conditioned reinforcers within the 
context of contemporary quantitative choice theories and behavioral momentum theory. Based partially 
on the finding that variations in parameters of conditioned reinforcement appear not to affect response 
strength as measured by resistance to change, long-standing assertions that conditioned reinforcers do 
not strengthen behavior in a reinforcement-like fashion are considered. A signposts or means-to-an-end 
account is explored and appears to provide a plausible alternative interpretation of the effects of stimuli 
associated with primary reinforcers. Related suggestions that primary reinforcers also might not have 
their effects via a strengthening process are explored and found to be worthy of serious consideration. 

An influential review of conditioned reinforcement
15 years ago (Williams, 1994a)
opened with a quote from more than 40 years 
ago lamenting the fact that there may be no 
other concept in psychology in such a state of 
theoretical disarray (Bolles, 1967). Unfortu- 
nately, Bolles’ evaluation of conditioned rein- 
forcement appears relevant despite 40 addi- 
tional years of work on the topic. Given the 
reams of published material and long-running 
controversies surrounding the concept of 
conditioned reinforcement, this review will 
not attempt to be exhaustive. For more 
thorough reviews, the interested reader should 
consider other sources (e.g., Fantino, 1977; 
Hendry, 1969; Nevin, 1973; Wike, 1966; 
Williams, 1994a, b). Also, when considering a 
concept like conditioned reinforcement with 
such a long and storied past, it is difficult to say 
anything that has not been said before. Thus, 
much of what follows will not be new. Rather, I 
will briefly review the contemporary approach 
to studying the strengthening effects of conditioned
reinforcers and then consider how
recent research has led me to reexamine an 
old question about the nature of conditioned 
reinforcement: Do conditioned reinforcers 
actually strengthen behavior upon which they 
are contingent? 

To begin, it may be helpful to consider a 
couple of definitions of conditioned reinforce- 
ment from recent textbooks. 

“A previously neutral stimulus that has ac- 
quired the capacity to strengthen responses 
because it has been repeatedly paired with a 
primary reinforcer” (Mazur, 2006). 

“A stimulus that has acquired the capacity to
reinforce behavior through its association with 
a primary reinforcer” (Bouton, 2007). 

As these definitions suggest, neutral stimuli 
seem to acquire the capacity to function as 
reinforcers as a result of their relationship with 
a primary reinforcer. This acquired capacity to 
strengthen responding is generally considered 
to be the outcome of Pavlovian conditioning 
(e.g., Mackintosh, 1974; Williams, 1994b). 
Thus, the same principles that result in a 
stimulus acquiring the capacity to function as a 
conditioned stimulus when predictive of an 
unconditioned stimulus seem to result in a 
neutral stimulus acquiring the capacity to 
function as a reinforcer when predictive of a 
primary reinforcer. 

Evidence for such acquired strengthening 
effects traditionally came from tests to see if 
the stimulus could result in the acquisition of a 
new response or change the rate or pattern of 
a response under maintenance or extinction 
conditions (see Kelleher & Gollub, 1962, for
review).

Somewhat later work showed that stimuli 
temporally proximate to primary reinforce- 
ment could change the rate and pattern of 
behavior upon which they are contingent in 
chain schedules of reinforcement (see Gollub, 
1977, for review). In that sense, such stimuli
may be considered reinforcers. But, as will be 
discussed later, interpreting such changes in 
rates or patterns of behavior in terms of a 
strengthening process has been controversial 
for a long time. Before returning to a discussion 
of whether conditioned reinforcers actually 
strengthen responding, I first examine the 
contemporary approach to measuring strength- 
ening effects of conditioned reinforcers within 
the context of matching-law based choice 
theories and then discuss a more limited body 
of research examining the strengthening ef- 
fects of conditioned reinforcers within the 
context of behavioral momentum theory. 



Fig. 1. Schematic of a concurrent-chains procedure 
(B=Blue, G=Green, R=Red). Dark keys are inoperative. 
See text for details. 


Theories of Choice and Relative Strength of 
Conditioned Reinforcement 

Herrnstein (1961) found that with concur- 
rent sources of primary reinforcement avail- 
able for two operant responses, the relative 
rate of responding to the options was directly 
related to the relative rate of reinforcement 
obtained from the options. Quantitatively the 
matching law states that: 


$$\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2} \qquad (1)$$

where $B_1$ and $B_2$ refer to the rates of
responding to the two options and $R_1$ and $R_2$
refer to the obtained reinforcement rates from 
those options. With the introduction of 
Equation 1 and its extensions (e.g., Herrn- 
stein, 1970), relative response strength as 
measured by relative allocation of behavior 
became a major theoretical foundation of the 
experimental analysis of behavior. 
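
As a quick numerical check of Equation 1, the following minimal Python sketch computes the share of responding that strict matching predicts for option 1. The function name and the example reinforcement rates are illustrative assumptions, not values from Herrnstein (1961).

```python
def matching_share(r1: float, r2: float) -> float:
    """Share of responding predicted for option 1 by strict matching (Equation 1)."""
    total = r1 + r2
    if total <= 0:
        raise ValueError("total obtained reinforcement rate must be positive")
    return r1 / total


# Hypothetical obtained reinforcement rates (reinforcers per hour).
share_1 = matching_share(40.0, 20.0)
print(f"Predicted share of responding on option 1: {share_1:.3f}")
```

With these assumed rates, two thirds of responding is predicted to go to the option that yields two thirds of the reinforcers.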

The insights provided by the matching law 
were extended to characterize the relative 
strengthening effects of conditioned reinforc- 
ers using concurrent-chains procedures. Fig- 
ure 1 shows a schematic of a concurrent-chains 
procedure. In the typical arrangement, two 
concurrently available initial-link schedules 
produce transitions to mutually exclusive 
terminal-link schedules signaled by different
stimuli. For example, in Figure 1, responding
to the concurrently available variable-interval 
(VI) 120-s schedules in the presence of blue 
keys produces transitions to terminal links 
associated with different VI schedules in the 
presence of either red or green keys. The 
allocation of responding in the initial links is 
presumably, at least in part, a reflection of the 
relative conditioned reinforcing effects of the 
terminal-link stimuli. 

Early work with concurrent chains suggested 
that the relative rate of responding in the 
initial links matched the relative rates of 
primary reinforcement delivered in the termi- 
nal links as described by Equation 1 (Autor, 
1969; Herrnstein, 1964). This outcome led 
Herrnstein to conclude that the strengthening 
effects of a conditioned reinforcer are a 
function of the rate of primary reinforcement 
obtained in its presence. However, Fantino 
(1969) soon showed that preference for a 
terminal link associated with a higher rate of 
primary reinforcement decreased as both 
initial-link durations were increased. A simple application
of Equation 1 does not predict this
outcome because the relative rate of reinforce- 
ment in the two terminal links remains 
unchanged. Thus, the strengthening effects 
of a conditioned reinforcer are clearly not just 
a function of the rate of primary reinforcement
obtained in its presence. To account for
these findings, Fantino (1969) proposed the 
following extension of the matching law 
known as delay reduction theory (DRT):

$$\frac{B_1}{B_1 + B_2} = \frac{T - t_1}{(T - t_1) + (T - t_2)} \qquad (2)$$

where $B_1$ and $B_2$ are response rates to the two
options in the initial links, $T$ is the average delay
to primary reinforcement from the onset of the
initial links, and $t_1$ and $t_2$ are the average delays
to primary reinforcement from the onset of 
each of the terminal link stimuli. DRT provides 
a quantitative theory of conditioned reinforce- 
ment suggesting that the value or strengthening 
effects of a stimulus depend upon the average 
reduction in expected time to reinforcement 
signaled by the onset of that stimulus. As noted 
by Williams (1988), DRT is similar to comparator
theories of Pavlovian conditioning (i.e.,
Scalar Expectancy Theory; Gibbon & Balsam, 
1981) that predict the related finding that 
conditioning depends on duration of a CS 
relative to the overall time between US presen- 
tations. Such a correspondence with findings 
and theorizing in Pavlovian conditioning is 
obviously good given the assumption noted 
above that the acquired strengthening capacity 
of conditioned reinforcers is the result of a 
Pavlovian conditioning process. 
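
The delay-reduction logic can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the helper `average_delay` uses a simplified calculation of $T$ (equal, independent VI initial links entered equally often), the terminal-link delays are hypothetical, and the handling of a stimulus that signals no delay reduction is an assumed convention rather than something taken from the text above.

```python
def drt_share(T: float, t1: float, t2: float) -> float:
    """Initial-link share for option 1 under delay reduction theory (Equation 2)."""
    d1 = T - t1  # delay reduction signaled by terminal-link stimulus 1
    d2 = T - t2  # delay reduction signaled by terminal-link stimulus 2
    # Assumed convention: a stimulus signaling no reduction attracts no choice.
    if d1 > 0 and d2 <= 0:
        return 1.0
    if d2 > 0 and d1 <= 0:
        return 0.0
    if d1 <= 0 and d2 <= 0:
        raise ValueError("neither terminal link signals a delay reduction")
    return d1 / (d1 + d2)


def average_delay(initial_vi: float, t1: float, t2: float) -> float:
    """Simplified T for equal concurrent VI initial links entered equally often."""
    # Assumption: two independent VI timers of mean `initial_vi` yield a
    # terminal-link entry every initial_vi / 2 seconds on average.
    return initial_vi / 2.0 + (t1 + t2) / 2.0


t1, t2 = 30.0, 90.0  # hypothetical terminal-link delays (s)
for initial_vi in (120.0, 600.0):
    T = average_delay(initial_vi, t1, t2)
    print(f"VI {initial_vi:.0f}-s initial links: share = {drt_share(T, t1, t2):.2f}")
```

Under these assumptions the predicted share for the shorter terminal link falls from 0.75 to 0.55 as the initial links are lengthened from 120 s to 600 s, even though the terminal links themselves are unchanged, which is the direction of the effect reported by Fantino (1969).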

One well-known complication of using 
chain schedules to study conditioned rein- 
forcement is that there is a dependency 
between responding in an initial link and 
ultimate access to the primary reinforcer (e.g., 
Branch, 1983; Dinsmoor, 1983; Williams, 
1994b). Thus, responding in the initial links 
of chain schedules likely reflects the strength- 
ening effects of both primary reinforcement in 
the terminal links and any conditioned rein- 
forcing effects of the terminal link stimuli. 
This additional dependency is especially rele- 
vant when initial links of different durations 
are arranged in concurrent-chains schedules 
(i.e., when different rates of conditioned 
reinforcement are arranged). To accommodate
the impact of changes in primary reinforcement
associated with differential initial-link
durations, Squires and Fantino (1971)
modified DRT such that:

$$\frac{B_1}{B_1 + B_2} = \frac{R_1 (T - t_1)}{R_1 (T - t_1) + R_2 (T - t_2)} \qquad (3)$$

where all the terms are as in Equation 2, and
$R_1$ and $R_2$ refer to the overall rates of primary
reinforcement associated with the two options.
In the absence of terminal links and their
putative conditioned reinforcing effects, Equation 3
reduces to the strict matching law (i.e.,
Equation 1). DRT thus became a general
theory of choice incorporating the effects of 
both primary and conditioned reinforcement. 
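
The reduction of the Squires and Fantino (1971) model to strict matching can be checked numerically. The sketch below is illustrative only: the function name and all numeric values are assumptions, and it simply rejects cases in which a terminal link signals no delay reduction rather than modeling them.

```python
def squires_fantino_share(R1: float, R2: float, T: float, t1: float, t2: float) -> float:
    """Initial-link share for option 1 under the Squires and Fantino (1971) model."""
    if T <= t1 or T <= t2:
        # Simplifying assumption: only handle positive delay reductions.
        raise ValueError("this sketch assumes T exceeds both terminal-link delays")
    w1 = R1 * (T - t1)  # primary reinforcement rate weighted by delay reduction
    w2 = R2 * (T - t2)
    return w1 / (w1 + w2)


# With no terminal links (t1 = t2 = 0), the T terms cancel and only R1, R2 matter.
print(squires_fantino_share(2.0, 1.0, 100.0, 0.0, 0.0))   # equals R1 / (R1 + R2)

# With equal overall reinforcement rates, the prediction equals that of Equation 2.
print(squires_fantino_share(1.0, 1.0, 120.0, 30.0, 90.0))
```

The first call returns the strict-matching share because $T$ factors out of numerator and denominator; the second returns the same value that the unweighted delay-reduction ratio gives for the same hypothetical delays.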

The basic approach of DRT inspired a 
number of additional general choice theories 
incorporating the strengthening effects of 
both primary and conditioned reinforcers. 
Examples of such theories include incentive 
theory (Killeen, 1982), melioration theory 
(Vaughan, 1985), the contextual choice model 
(CCM; Grace, 1994), and the hyperbolic value- 
added model (HVA; Mazur, 2001). A consid- 
eration of the relative merits of all these 
models is beyond the scope of the present 
paper, but CCM and HVA will be briefly 
reviewed in order to explore some relevant 
issues about conditioned reinforcement and 
response strength. 

Both CCM and HVA are based on the 
concatenated generalized matching law. The 
generalized matching law accounts for com- 
mon deviations (i.e., bias, under- or over- 
matching) from the strict matching law pre- 
sented in Equation 1 (Baum, 1974) and states 
that: 


$$\frac{B_1}{B_2} = b \left( \frac{R_1}{R_2} \right)^{a} \qquad (4)$$


where the terms are as in Equation 1 and the 
parameters a and b reflect sensitivity to 
variations in reinforcement ratios and bias 
unrelated to relative reinforcement, respec- 
tively. The concatenated matching law (Baum 
& Rachlin, 1969) suggests that choice is 
dependent upon the value of the alternatives, 
and that value is determined by the multipli- 
cative effects of any number of reinforcement 
parameters (e.g., rate, magnitude, immediacy,
etc.). Generalized concatenated matching thus
suggests that:

$$\frac{B_1}{B_2} = b \left( \frac{R_1}{R_2} \right)^{a_1} \left( \frac{A_1}{A_2} \right)^{a_2} \left( \frac{1/D_1}{1/D_2} \right)^{a_3} \qquad (5)$$

with added terms for reinforcement amounts
($A_1$ and $A_2$), reinforcement immediacies ($1/D_1$
and $1/D_2$), and their respective sensitivity
terms $a_2$ and $a_3$. Davison (1983) suggested
that the concatenated matching law could be 
used as a basis for a model of concurrent- 
chains performance by replacing rates of 
primary reinforcement (i.e., $R_1$ and $R_2$) with
rates of transition to the terminal-link stimuli 
and the value of the terminal-link stimuli (i.e., 
conditioned reinforcers). A general form of 
such a model is: 

$$\frac{B_1}{B_2} = b \left( \frac{r_1}{r_2} \right)^{a_1} \left( \frac{V_1}{V_2} \right) \qquad (6)$$



where $r_1$ and $r_2$ refer to rates of terminal-link
transition (i.e., rate of conditioned reinforcement)
associated with the options, and $V_1$ and
$V_2$ are summary terms describing the value or
strengthening effects of the two terminal-link 
stimuli. The question then becomes how the 
value or strengthening effects of the condi- 
tioned reinforcers should be calculated. 
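
To make the roles of the sensitivity and bias parameters concrete, here is a minimal Python sketch of generalized matching and of a concatenated version in which several ratios, each raised to its own sensitivity, are multiplied together. The function names and all numeric values are hypothetical.

```python
def generalized_matching_ratio(r1: float, r2: float, a: float = 1.0, b: float = 1.0) -> float:
    """Response ratio B1/B2 from the generalized matching law: b * (r1/r2) ** a."""
    return b * (r1 / r2) ** a


def concatenated_ratio(ratio_sensitivity_pairs, b: float = 1.0) -> float:
    """Bias b times the product of each ratio raised to its own sensitivity."""
    value = b
    for ratio, sensitivity in ratio_sensitivity_pairs:
        value *= ratio ** sensitivity
    return value


# Undermatching (a < 1): a 4:1 reinforcement ratio yields only a 2:1 response ratio.
print(generalized_matching_ratio(4.0, 1.0, a=0.5))

# Hypothetical trade-off: 4:1 in rate but 1:2 in amount, both with sensitivity 1.
print(concatenated_ratio([(4.0, 1.0), (0.5, 1.0)]))
```

Because the terms combine multiplicatively, an advantage on one reinforcement parameter can be offset by a disadvantage on another, as in the second call.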

According to CCM, the value of a condi- 
tioned reinforcer is a multiplicative function of 
the concatenated effects of the parameters of 
primary reinforcers obtained in the presence 
of a stimulus: 

$$\frac{B_1}{B_2} = b \left( \frac{r_1}{r_2} \right)^{a_1} \left[ \left( \frac{A_1}{A_2} \right)^{a_2} \left( \frac{1/D_1}{1/D_2} \right)^{a_3} \right]^{T_t/T_i} \qquad (7)$$

where all terms are as in Equation 5, and the 
added terms $T_t$ and $T_i$ refer to the average
durations of the terminal and initial links,
respectively. Thus, CCM suggests that conditioned
reinforcing value is independent of the
temporal context, but sensitivity to relative
value of the conditioned reinforcers changes
with the temporal context. This occurs because
the $T_t/T_i$ exponent decreases the other
sensitivity parameters ($a_2$, $a_3$) when relatively
longer initial-link schedules are arranged. In 
essence, CCM is a restatement of Herrnstein’s 
(1964) original conclusion that the strength- 
ening effects of a conditioned reinforcer are a 
function of primary reinforcement obtained in 
its presence, but modified by the fact that 
temporal context affects sensitivity to such 
strengthening effects. 
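
The effect of temporal context on sensitivity can be sketched in a few lines of Python. This is a deliberately reduced illustration, not the full model: it takes a single terminal-link value ratio, and the function name, the sensitivity value, and the durations are all assumptions.

```python
def context_scaled_ratio(value_ratio: float, sensitivity: float,
                         terminal_duration: float, initial_duration: float) -> float:
    """Value ratio raised to a sensitivity that is scaled by terminal/initial duration."""
    effective_sensitivity = sensitivity * terminal_duration / initial_duration
    return value_ratio ** effective_sensitivity


value_ratio = 3.0   # hypothetical 3:1 terminal-link value ratio
for initial_duration in (20.0, 80.0):
    scaled = context_scaled_ratio(value_ratio, 1.0, 20.0, initial_duration)
    print(f"initial links {initial_duration:.0f} s: contribution to B1/B2 = {scaled:.3f}")
```

The value ratio itself is the same in both cases; only the exponent changes, so lengthening the initial links moves the predicted preference toward indifference.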

Alternatively, HVA suggests that the value of 
a terminal-link stimulus is determined by the
summed effects of the amounts and delays to
primary reinforcers obtained in its presence 
and uses Mazur’s (1984) hyperbolic decay 
model to calculate the value of a conditioned 
reinforcer such that: 

where the terms are as above and the 
parameter k represents sensitivity to primary- 
reinforcement delay. Furthermore, HVA sug- 
gests that preference in concurrent chains is 
determined by the increase in value associated 
with a transition to a terminal link from the 
initial links such that: 


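$$\frac{B_1}{B_2} = b \left( \frac{r_1}{r_2} \right)^{a_1} \left( \frac{V_{t1} - a_2 V_i}{V_{t2} - a_2 V_i} \right) \qquad (9)$$

where $V_{t1}$ and $V_{t2}$ represent the values of each
of the terminal links, $V_i$ represents the value
of the initial links, and $a_2$ is a sensitivity
parameter scaling initial-link values. Like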
DRT, HVA is similar to comparator theories 
of Pavlovian conditioning, but in the case of 
HVA this is because a terminal stimulus will 
only attract choice if the value of that stimulus 
is greater than the value of initial links. 
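
A small Python sketch may help fix ideas about the value-added logic. It rests on several assumptions: value is computed with the single-delay hyperbolic form $V = A/(1 + kD)$, the initial-link value is supplied directly rather than derived, the preference rule is the value-added ratio described above, and every numeric value (including $k$) is hypothetical.

```python
def hyperbolic_value(amount: float, delay: float, k: float) -> float:
    """Single-delay hyperbolic decay: value falls as amount / (1 + k * delay)."""
    return amount / (1.0 + k * delay)


def hva_ratio(r1: float, r2: float, v_t1: float, v_t2: float, v_i: float,
              a1: float = 1.0, a2: float = 1.0, b: float = 1.0) -> float:
    """Response ratio B1/B2 from the value added by each terminal link."""
    added_1 = v_t1 - a2 * v_i  # value gained by entering terminal link 1
    added_2 = v_t2 - a2 * v_i  # value gained by entering terminal link 2
    if added_1 <= 0 or added_2 <= 0:
        raise ValueError("this sketch assumes both terminal links add value")
    return b * (r1 / r2) ** a1 * (added_1 / added_2)


k = 1.0                                   # hypothetical delay sensitivity
v_t1 = hyperbolic_value(1.0, 5.0, k)      # terminal link 1: 5-s delay
v_t2 = hyperbolic_value(1.0, 20.0, k)     # terminal link 2: 20-s delay
v_i = hyperbolic_value(1.0, 60.0, k)      # assumed initial-link value
print(f"Predicted B1/B2 = {hva_ratio(1.0, 1.0, v_t1, v_t2, v_i):.4f}")
```

Note that the comparison is between increases in value relative to the initial links, so raising `v_i` toward `v_t2` sharpens preference for the first option even though neither terminal-link value changes.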

To aid in the comparison of DRT, CCM, and 
HVA, consider a generalized-matching version 
of the DRT (cf. Mazur, 2001) with free 
parameters for sensitivity to relative rates of 
primary reinforcement $a_1$, sensitivity to terminal-link
durations $a_2$, and sensitivity to relative
conditioned reinforcement value $k$, such that:

$$\frac{B_1}{B_2} = b \left( \frac{R_1}{R_2} \right)^{a_1} \left( \frac{T - a_2 t_1}{T - a_2 t_2} \right)^{k} \qquad (10)$$


Note that DRT, CCM, and HVA each propose 
a different way for calculating the value of 
conditioned reinforcers. Note also that CCM 
and HVA differ from DRT in that DRT 
includes relative rates of primary reinforcement
($R_1/R_2$) rather than relative rates of
conditioned reinforcement ($r_1/r_2$). Despite
these differences in approach, the theories 
all do about equally well accounting for the 
main findings from the large body of research 
on concurrent-chains schedules when equipped 
with the same number of free parameters (see 
Mazur, 2001). 

Although the study of behavior on concurrent-chains
schedules and the associated models
have no doubt increased our understanding
of conditioned reinforcement, the heavy
reliance on concurrent chains has been 
limiting with respect to the questions that 
can be answered about conditioned reinforce- 
ment. The vast majority of research on 
concurrent-chains schedules has focused on 
how to characterize the effects of changes in 
primary reinforcement or the effects of tem- 
poral context on conditioned reinforcement 
value. These are surely interesting and impor- 
tant questions, but the dependency noted 
above between responding in the initial links 
and access to primary reinforcers in the 
terminal links has made it difficult to examine 
the effects of parameters of conditioned rein- 
forcement on choice. For example, the most 
straightforward way to vary the rate of condi- 
tioned reinforcement is to modify the dura- 
tion of an initial link. But, doing so also 
changes the rate of primary reinforcement, 
relative value of the conditioned reinforcer, or 
sensitivity to relative value, depending on the 
model. The confound between rates of prima- 
ry and conditioned reinforcement is such a 
prominent feature of the procedure that, as 
noted above, DRT has formalized the effects of 
changes in initial link duration as resulting 
from the associated changes in primary rein- 
forcement and the value of the conditioned 
reinforcer. 

An alternative approach to studying conditioned
reinforcement rate in concurrent
chains involves adding extra terminal-link
entries associated with extinction and then
varying their relative rate of production by the two
initial-link options. Although this approach
has produced some modest evidence for the
effects of conditioned reinforcement rate on
choice (e.g., Williams & Dunn, 1991), the
procedure is plagued by the transitory nature
of the effects and interpretive problems
(Mazur, 1999; see also Shahan, Podlesnik, & 
Jimenez-Gomez, 2006, for discussion). If one’s 
interest is in examining the strengthening 
effects of conditioned reinforcers, this state of 
affairs is unfortunate. The reason is that it 
becomes very difficult to examine the effects of 
variations in parameters of conditioned reinforcement
independently of changes in primary
reinforcement. Thus, our understanding
of the putative strengthening effects of conditioned
reinforcers might benefit from increased
use of procedures allowing better
separation of the effects of primary and
conditioned reinforcers.

Fig. 2. Schematic of an observing-response procedure
(W=White, G=Green, R=Red). See text for details.

One such procedure is the observing-re- 
sponse procedure (Wyckoff, 1952). Figure 2 
shows an example of an observing-response 
procedure. Unsignaled periods of food rein- 
forcement on a VI schedule alternate irregu- 
larly with extinction on the food key (i.e., a 
mixed schedule). Responses on a separate 
observing key produce brief periods of stimuli 
differentially associated with the schedule of 
reinforcement in effect on the food key:
either VI (i.e., S+) or extinction (i.e., S−).
Responding on the observing key is widely
believed to be maintained by the conditioned
reinforcing effects of S+ presentations (e.g.,
Dinsmoor, 1983; Fantino, 1977). In fact, S−
deliveries can be omitted from the procedure
with little impact if S+ deliveries are made
intermittent (e.g., Dinsmoor, Browne, & Lawrence,
1972). Importantly, responding on the
observing key has no effect on the scheduling 
of primary reinforcers on the food key. All 
food-key reinforcers can be obtained in the 
absence of responding on the observing key. 
In addition, unlike chain schedules of rein- 
forcement, parameters of conditioned rein- 
forcement delivery (e.g., rate) can be exam- 
ined across a wide range without affecting 
primary reinforcement rates or conditioned 
reinforcement value. 

Fig. 3. Schematic of the concurrent observing-response
procedure used by Shahan et al. (2006). “Obs”
refers to an observing-response key. The box at the bottom
lists the different ratios of S+ presentation rates examined
across the conditions of the experiment and VI schedules
on the left and right observing keys to generate those
ratios. See text for details.

S+ ratio    Left Obs    Right Obs
1:9         VI 100      VI 11.1
1:3         VI 40       VI 13.3
1:1         VI 20       VI 20
3:1         VI 13.3     VI 40
9:1         VI 11.1     VI 100

In order to study effects of relative conditioned
reinforcement rate on choice in the
absence of changes in rates of primary
reinforcement and value of the conditioned 
reinforcers, Shahan et al. (2006) used a 
concurrent observing-response procedure (cf. 
Dinsmoor, Mulvaney, & Jwaideh, 1981). Fig- 
ure 3 shows a schematic of the procedure used 
by Shahan et al. On a center key, unsignaled 
periods of a VI 90-s schedule of food rein- 
forcement alternated irregularly with extinc- 
tion. Responses to either the left or the right 
key intermittently produced 15-s S+ presentations
when the VI 90 was in effect on the food
key. Both observing responses produced S+
deliveries on VI schedules, and the ratio of S+
delivery rates for the two observing responses
was varied by changing the VI schedules across
conditions. Thus, assuming that S+ deliveries
function as conditioned reinforcers, the rela- 
tive rate of conditioned reinforcement was 
varied across conditions. Importantly, rates of 
primary reinforcement remained unchanged 
across conditions and the value of the S+
deliveries likely remained unchanged (i.e., the
ratio of S+ to mixed-schedule food deliveries
remained unchanged). Despite the lack of 
changes in primary reinforcement rate or 
conditioned reinforcement value, relative rates 
of responding on the observing keys varied as 
an orderly function of relative rates of S+
delivery. Furthermore, relative rates of observing
were well described by the generalized
matching law when relative rates of conditioned
reinforcement (i.e., $r_1/r_2$) were used
for the reinforcement ratios. Thus, the data of
Shahan et al. appear to be consistent with
general choice models like CCM and HVA that
clearly include a role for relative conditioned
reinforcement rate when the value of the
conditioned reinforcers remains unchanged,
and inconsistent with DRT, which does not (see
Fantino & Romanowich, 2007; Shahan &
Podlesnik, 2008b, for further discussion).
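
The arithmetic behind the schedule pairs listed with Figure 3, and the kind of log-ratio fit used to evaluate generalized matching, can be sketched in Python. The VI values are those listed with Figure 3 (assumed here to be in seconds); everything else is an assumption for illustration. In particular, the response ratios are synthetic numbers generated from arbitrary parameters ($a = 0.8$, $b = 1.2$), not data from Shahan et al. (2006), and the rates are programmed rates per hour of time in which S+ deliveries are available.

```python
import math

# (left VI, right VI) for each nominal S+ ratio listed with Fig. 3.
CONDITIONS = {
    "1:9": (100.0, 11.1),
    "1:3": (40.0, 13.3),
    "1:1": (20.0, 20.0),
    "3:1": (13.3, 40.0),
    "9:1": (11.1, 100.0),
}


def programmed_rate(vi_seconds: float) -> float:
    """Programmed deliveries per hour for a VI schedule with the given mean."""
    return 3600.0 / vi_seconds


def fit_generalized_matching(reinforcer_ratios, response_ratios):
    """Least-squares fit of log(B1/B2) = a * log(r1/r2) + log(b); returns (a, b)."""
    xs = [math.log10(x) for x in reinforcer_ratios]
    ys = [math.log10(y) for y in response_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = 10.0 ** (mean_y - a * mean_x)
    return a, b


rate_ratios = []
totals = []
for label, (left_vi, right_vi) in CONDITIONS.items():
    left = programmed_rate(left_vi)
    right = programmed_rate(right_vi)
    rate_ratios.append(left / right)
    totals.append(left + right)
    print(f"{label}: ratio = {left / right:.3f}, summed rate = {left + right:.1f} per hour")

# Synthetic response ratios from assumed parameters (NOT data from the study).
synthetic_response_ratios = [1.2 * ratio ** 0.8 for ratio in rate_ratios]
a_hat, b_hat = fit_generalized_matching(rate_ratios, synthetic_response_ratios)
print(f"Recovered a = {a_hat:.3f}, b = {b_hat:.3f}")
```

The schedule pairs yield the nominal ratios while holding the summed programmed S+ rate close to 360 per hour, so only the relative rate of S+ delivery differs across conditions. The fit simply recovers the parameters used to generate the synthetic ratios; with real data, the slope would estimate sensitivity to relative conditioned reinforcement rate.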

If relative allocation of behavior as formal- 
ized in the matching law is accepted as an 
appropriate measure of relative response 
strength, the fact that choice was governed 
by relative rate of S+ deliveries seems consistent
with the notion that conditioned reinforcers
function like primary reinforcers. In
short, all seems well for the notion of 
conditioned reinforcement. Conditioned rein- 
forcers appear to acquire the capacity to 
strengthen responding as a result of their 
association with primary reinforcers, and they 
appear to impact relative response strength in 
a manner consistent with a hallmark quantita- 
tive theory of operant behavior. In addition, 
contemporary choice theories like CCM and 
HVA capture the effects of relative rate of 
conditioned reinforcement and the effects of 
changes in conditioned reinforcement value 
associated with variations in primary reinforcement
and/or the temporal context. Unfortunately,
all does not seem well when the relative
strengthening effects of putative conditioned 
reinforcers are considered within the ap- 
proach to response strength provided by 
behavioral momentum theory. 

Behavioral Momentum and Response Strength

Behavioral momentum theory (e.g., Nevin & 
Grace, 2000) suggests that response rates and 
resistance to change are two separable aspects 
of operant behavior. The contingent relation 
between responses and reinforcers governs 
response rate in a manner consistent with 
the matching law (e.g., Herrnstein, 1970), but 
the Pavlovian relation between a discriminative 
stimulus and reinforcers obtained in the 
presence of that stimulus governs the persis- 
tence of responding under conditions of 
disruption (i.e., resistance to change). Fur- 
thermore, the theory suggests that resistance 
to change provides a better measure of 
response strength than response rates because 
response rates are susceptible to control by 
operations that may not necessarily impact 
strength (e.g., Nevin, 1974; Nevin & Grace, 
2000). For example, schedules requiring 
paced responding (e.g., differential reinforce- 
ment of low rate behavior versus differential 
reinforcement of high rate behavior) may 
produce differential response rates, but these 
differences may not be attributable to differ- 
ential response strength. 

In the usual procedure for studying resis- 
tance to change, a multiple schedule of 
reinforcement is used to arrange differential 
conditions of primary reinforcement in the 
presence of two components signaled by 
distinctive stimuli. Some disruptor (e.g., ex- 
tinction, presession feeding) is then intro- 
duced and the decrease in response rates 
relative to predisruption response rates pro- 
vides a measure of resistance to change. 
Greater resistance to change (i.e., response 
strength) is evidenced by relatively smaller 
decreases from baseline. Higher rates of
primary reinforcement have been found to 
reliably produce greater resistance to change 
(see Nevin, 1992). Furthermore, support for 
the separable roles of response rates and 
resistance to change comes from experiments 
in which the addition of response-indepen- 
dent reinforcers into one component of a 
multiple schedule decreases predisruption 
response rates, but increases resistance to 
change (e.g., Ahearn, Clark, Gardenier, 
Chung, & Dube, 2003; Cohen, 1996; Grimes 
& Shull, 2001; Harper, 1999; Igaki & Sakagami,
2004; Mace et al., 1990; Nevin, Tota,
Torquato, & Shull, 1990; Shahan & Burke, 
2004). This outcome is consistent with the
expectations of the theory because the
inclusion of the added response-independent 
reinforcers degrades the response-reinforcer 
relation, but improves the stimulus-reinforc- 
er relation by increasing rate of reinforce- 
ment obtained in the presence of the 
discriminative-stimulus context. 

The relation between relative resistance to 
change and relative primary reinforcement 
rate obtained in the presence of two stimuli is 
well described by a power function such that:

$$\frac{m_1}{m_2} = \left( \frac{R_1}{R_2} \right)^{b} \qquad (11)$$

where $m_1$ and $m_2$ are the resistance to change
of responding in stimuli 1 and 2, and $R_1$ and
$R_2$ refer to the rates of primary reinforcement
delivered in the presence of those stimuli 
(Nevin, 1992). The parameter b reflects 
sensitivity of ratios of resistance to change to 
variations in the ratio of reinforcement rates in 
the two stimuli, and is generally near 0.5 
(Nevin, 2002). Thus, as with the matching law,
relative response strength is a power function 
of relative reinforcement rate, but in the case 
of behavioral momentum theory, resistance to 
change provides the relevant measure of 
response strength. 
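
The two quantities discussed here can be illustrated with a brief Python sketch. Expressing disrupted responding as a log proportion of baseline is an assumed choice of measure (the text above specifies only a decrease relative to predisruption rates), and the response rates and reinforcement rates are hypothetical.

```python
import math


def log_proportion_of_baseline(disrupted_rate: float, baseline_rate: float) -> float:
    """Log10 of disrupted responding relative to predisruption responding."""
    return math.log10(disrupted_rate / baseline_rate)


def predicted_resistance_ratio(R1: float, R2: float, b: float = 0.5) -> float:
    """Power-function prediction for m1/m2 given reinforcement rates R1 and R2."""
    return (R1 / R2) ** b


# Hypothetical components with equal baselines but unequal disruption.
print(log_proportion_of_baseline(45.0, 60.0))   # smaller drop: more resistant
print(log_proportion_of_baseline(30.0, 60.0))   # larger drop: less resistant

# A fourfold difference in reinforcement rate with b = 0.5.
print(predicted_resistance_ratio(120.0, 30.0))
```

Values closer to zero on the log-proportion measure indicate greater resistance to change. With a sensitivity near 0.5, a fourfold difference in reinforcement rate is predicted to produce only a twofold difference in resistance to change.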

Behavioral Momentum and Conditioned 
Reinforcement 

Unlike the well-developed extensions of the 
matching law to conditioned reinforcement 
using concurrent-chains procedures, relatively 
little work has been conducted extending 
insights about response strength from behav- 
ioral momentum theory to conditioned rein- 
forcement. Few experiments have examined 
resistance to change of responding main- 
tained by conditioned reinforcement. Presum- 
ably if conditioned reinforcers acquire the 
ability to strengthen responding in a manner 
similar to primary reinforcers, then condi- 
tioned reinforcers should similarly increase 
resistance to change. 

A fairly large body of early work with simple 
chain schedules demonstrated that respond- 
ing in initial links tends to be more easily 
disrupted than responding in terminal links 
(see Nevin, Mandell, & Yarensky, 1981, for 
review). Assuming that responding in the
initial links of chain schedules reflects the 
strengthening effects of the terminal-link 
stimuli, this result might be interpreted to
suggest that responding maintained by condi- 
tioned reinforcement is less resistant to 
change than responding maintained by pri- 
mary reinforcement. Such an outcome is not 
unexpected given that any capacity to rein- 
force shown by conditioned reinforcers should 
be derived from primary reinforcers, and thus 
is likely weaker. But, with the exception of this 
apparent difference in the relative strengthen- 
ing effects of conditioned and primary rein- 
forcers, this early research provided little 
information about the impact of conditioned 
reinforcement on resistance to change. 

Like the vast majority of research on choice 
using concurrent-chains schedules, some re- 
search on resistance to change has examined 
how parameters of primary reinforcement 
occurring in the presence of a conditioned 
reinforcer affect resistance to change of 
responding maintained by that conditioned 
reinforcer. Nevin et al. (1981) examined 
resistance to change of responding in a 
multiple schedule of chain schedules. The 
two components of the multiple schedule 
arranged alternating periods of two-link chain 
random-interval (RI) schedules using different
stimuli for the initial and terminal links
in the two components. The initial links in
both components of the multiple schedule
were always RI 40-s schedules, but the terminal
links differed either in terms of the rate or 
magnitude of primary reinforcement. Thus, 
the arrangement resembled the usual multiple 
schedule of reinforcement used in behavioral 
momentum research, but allowed compari- 
sons of resistance to change of responding in 
the initial links. As a result, the effects of 
variations in the parameters of the primary 
reinforcers in the two terminal links could be 
examined in terms of their effects on resis- 
tance to change of responding producing 
those terminal links (i.e., conditioned rein- 
forcers). As is true with preference in concur- 
rent-chains schedules, Nevin et al. found that 
response rates and resistance to change of 
responding in an initial link were greater with 
higher rates or larger magnitudes of primary 
reinforcement in a terminal link. Nonetheless, 
as with the use of concurrent-chains schedules 
to study the effects of conditioned reinforce- 
ment on preference, the dependency between 
responding in the initial links and access to 
the primary reinforcer in the terminal links
makes it difficult to know the relative contributions
of conditioned and primary reinforcement
to initial-link resistance to change.

Fig. 4. Schematic of the multiple schedule of observing-response
procedures used by Shahan and Podlesnik
(2005). Panel labels: ICI, Observing, Food. ICI refers to the
intercomponent interval. Other
details available in the text.

Accordingly, Shahan, Magee, and Dobber- 
stein (2003) used an observing-response pro- 
cedure to examine resistance to change of 
responding maintained by a conditioned 
reinforcer. Like Nevin et al. (1981), they 
examined the effects of rate of primary 
reinforcement obtained during a conditioned 
reinforcer on resistance to change of respond- 
ing maintained by that conditioned reinforcer. 
Specifically, Shahan et al. arranged a multiple
schedule of observing-response procedures in 
two experiments. The procedure from Exper- 
iment 2 is depicted in Figure 4. The two 
components of the multiple schedule each 
arranged an observing-response procedure 
using different stimuli for the mixed schedule, 
S+, and S−. The components were alternately
presented for 5 min at a time, and were 
separated by an intercomponent interval. 
Observing responses in both components 
produced the schedule-correlated stimuli on 
an RI 15-s schedule. In the Rich component, 
an RI 30 schedule of food reinforcement 
alternated with extinction on the food key, 
and in the Lean component an RI 120 
schedule of food reinforcement alternated 
with extinction. Thus, observing in the Rich 
component produced an S+ associated with a
fourfold higher rate of primary reinforcement. 
Consistent with initial-link responding in the 
Nevin et al. (1981) multiple chain-schedule 
experiment, observing rates and resistance to 
change were greater in the Rich component 
than in the Lean component. Furthermore, 
Shahan et al. noted that even though respond- 
ing on the observing key was less resistant to 
change than responding on the food key (also 
consistent with previous chain-schedule data),
sensitivity parameters (i.e., b in Equation 1) for 
both observing and food-key responding were 
near the typical value in previous behavioral 
momentum research (i.e., 0.5). Thus, respond- 
ing maintained by a conditioned reinforcer 
was affected by the rate of primary reinforce- 
ment obtained in its presence in a manner 
similar to responding maintained directly by 
the primary reinforcer. These data led Shahan 
et al. to conclude that the strengthening 
effects of a stimulus as a conditioned reinforc- 
er (as measured with resistance to change) 
depend upon the rate of primary reinforce- 
ment obtained in the presence of that stimu- 
lus. This conclusion is obviously the same as 
the conclusion reached by Herrnstein (1964) 
based on concurrent chains data, and formal- 
ized in CCM. 

At this point, there seems to be satisfying 
integration of findings across different do- 
mains examining the strengthening effects of 
stimuli associated with primary reinforcers on 
choice and resistance to change. Grace and 
Nevin (1997) noted that preference for a 
terminal-link stimulus and resistance to
change of responding in the presence of that
stimulus are correlated (see also Nevin & 
Grace, 2000). Thus, relatively better stimu- 
lus-reinforcer relations not only increase the 
strength of responding that occurs in their 
presence, they also generate preference for 
behavior that produces them. This outcome 
led Nevin and Grace (2000) to conclude that 
the relative strengthening effects of a stimulus 
as measured by choice and relative resistance 
to change of responding in the presence of 
that stimulus are reflections of a single central 
construct (Grace & Nevin, 1997; Nevin & 
Grace, 2000). Importantly, this underlying 
construct appears to be a result of the 
Pavlovian stimulus-reinforcer relation charac- 
terizing the relevant stimulus. The findings of 
Nevin et al. (1981) and Shahan et al. (2003) 
further support the notion of such a central 
construct by showing that not only do stimuli 
with a better Pavlovian stimulus-reinforcer 
relation with a primary reinforcer attract 
preference and increase resistance to change 
of responding in their presence, they also 
increase resistance to change of responding 
that produces them. Thus, we have a consis- 
tent account of the strengthening effects of 
conditioned reinforcers on responding mea- 
sured by both preference and resistance to 
change of behavior that produces them, and 
the effects of those conditioned reinforcers on 
the strength of responding occurring in their 
presence. Interestingly, the expression used 
for the Pavlovian stimulus-reinforcer relation 
providing the basis of behavioral momentum 
theory (see Nevin, 1992) is the rate of 
reinforcement in the presence of a stimulus 
relative to the overall rate of reinforcement 
within an experimental session, or in other 
words, a re-expression of the cycle-to-trial ratio 
of Scalar Expectancy Theory (Gibbon & 
Balsam, 1981). Thus, it appears that we have 
an integrative account of various strengthen- 
ing effects of food-associated stimuli that 
appears appropriately grounded in an influ- 
ential comparator account of Pavlovian condi- 
tioning. 

The integration above, however, is based 
entirely on differences in primary reinforce- 
ment experienced in the presence of a stimu- 
lus. The concurrent observing-response data of 
Shahan et al. (2006) can be considered an 
extension of the generality noted above by 
showing that the relative frequency of presentation
of a food-associated stimulus also impacts
choice when the rate of primary reinforcement 
in that stimulus remains unchanged. Thus, the 
relative strengthening effects of food-associated 
stimuli depend upon the Pavlovian relation 
between food and the stimuli, and variations in 
the relative frequency of such stimuli earned by 
two responses produces an outcome consistent 
with what we would expect if the stimuli were 
functioning as reinforcers. If such stimuli are 
conditioned reinforcers and strengthen re- 
sponding that produces them, then variations 
in conditioned reinforcement rate should also 
impact resistance to change in a manner 
consistent with their effects on choice noted 
by Shahan et al. (2006). The useful properties
of the observing-response procedure noted 
above also allow one to address this apparently 
straightforward question. Surprisingly though, 
variations in relative rate of putative condi- 
tioned reinforcers do not appear to affect 
response strength as measured by resistance 
to change. 

In two experiments, Shahan and Podlesnik 
(2005) used a multiple schedule of observing-response
procedures to examine the effects of
rate of conditioned reinforcement on resis- 
tance to change. The procedure was like that 
in Figure 4, but the food keys in the two 
components arranged the same rate of prima- 
ry reinforcement by using the same value for 
the RI schedule of food delivery. Most impor- 
tantly, the two components delivered different 
rates of conditioned reinforcement by arrang- 
ing different RI schedules on the observing 
key. Experiment 1 arranged a 4:1 ratio of
conditioned reinforcement rates by using RI 
15 and RI 60 schedules for the Rich and Lean 
observing components, respectively. Experi- 
ment 2 arranged a 6:1 ratio of conditioned 
reinforcement rates by using RI 10 and RI 60 
schedules for the Rich and Lean observing 
components, respectively. Consistent with the 
concurrent observing data of Shahan et al. 
(2006), observing rates in both experiments 
were higher in the component associated with 
a higher rate of S+ deliveries. In that sense,
one might conclude that S+ deliveries served
as conditioned reinforcers. However, in Experiment
1, there was no difference in
resistance to presession feeding or extinction 
in the Rich and Lean components. In Exper- 
iment 2, resistance to change was actually 
somewhat greater in the component with the
lower rate of S+ delivery. Thus, despite the fact
that higher rates of a putative conditioned 
reinforcer generated higher response rates, 
they did not increase resistance to change of 
responding. This is not what one would expect 
if the S+ deliveries had indeed acquired the
capacity to strengthen responding. 
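
The schedule arithmetic behind the 4:1 and 6:1 ratios can be checked with a short sketch. It assumes that an RI t-s schedule arranges, on average, one event every t seconds, so the programmed rate is 3600/t events per hour. The function names are illustrative and do not come from the original studies, and obtained rates in an actual session would also depend on responding.

```python
def programmed_rate_per_hour(ri_seconds: float) -> float:
    """Programmed events per hour for an RI schedule with the given mean interval (s).

    Assumption: on average one event is arranged every `ri_seconds` seconds.
    """
    return 3600.0 / ri_seconds


def rate_ratio(rich_ri_s: float, lean_ri_s: float) -> float:
    """Ratio of programmed rates, Rich component relative to Lean component."""
    return programmed_rate_per_hour(rich_ri_s) / programmed_rate_per_hour(lean_ri_s)


# Observing-key schedules described in the text for Shahan and Podlesnik (2005)
exp1_ratio = rate_ratio(15, 60)  # Experiment 1: RI 15 s vs. RI 60 s
exp2_ratio = rate_ratio(10, 60)  # Experiment 2: RI 10 s vs. RI 60 s

print(f"Experiment 1 programmed ratio: {exp1_ratio:.0f}:1")
print(f"Experiment 2 programmed ratio: {exp2_ratio:.0f}:1")
```

Because the food-key schedules were the same in both components, only this ratio on the observing key differed between Rich and Lean.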

Shahan and Podlesnik (2008a) used a 
multiple schedule of observing-response pro- 
cedures to ask a similar question about the 
effects of value of a conditioned reinforcer on 
resistance to change in three experiments. 
The first two experiments placed conditioned 
reinforcement value and primary reinforce- 
ment rate in opposition to one another and 
the third experiment arranged different val- 
ued conditioned reinforcers, but the same 
overall rate of primary reinforcement. Briefly, 
Experiment 1 decreased the value of S+
presentations in one component by degrading
the relation between S+ and food deliveries via
additional food deliveries uncorrelated with
S+. Experiment 2 increased the value of S+ in
one component by increasing the probability
of an extinction period on the food key, thus
decreasing the rate of primary reinforcement
during the mixed schedule and increasing the
improvement in reinforcement rate signaled
by S+. Experiment 3 was similar to Experiment
2 except that the rate of primary reinforcement
during S+ was increased in the higher-value
component to compensate for the
primary reinforcers removed during the mixed
schedule and to equate overall primary reinforcement
rate in the two components. The
data showed that, as expected, response rates
were higher in the component in which
observing produced an S+ with a higher value.
One might conclude based on response rates
alone that higher valued S+ deliveries are
more potent conditioned reinforcers than
lower valued S+ deliveries. However, despite
the higher baseline response rates, resistance
to change was not affected by the value of S+
deliveries. Thus response strength as mea- 
sured by resistance to change appears not to 
be affected by the value of a conditioned 
reinforcer. 

Shahan and Podlesnik (2008b) provided a 
quantitative analysis of all the resistance to 
change of observing experiments described 
above in order to explore how conditioned 
and primary reinforcement contributed to 
resistance to change. When considered together,
the six experiments provided a fairly wide
range in variables that might contribute to 
resistance to change of observing. Separate 
analyses of relative resistance to change of 
observing were conducted as a function of 
relative rate of primary reinforcement during 
S+, relative rate of S+ delivery (i.e., conditioned
reinforcement rate), relative value of S+
(S+ food rate/mixed-schedule food rate),
relative overall rate of primary reinforcement 
in the component, and relative rate of primary 
reinforcement in the presence of the mixed- 
schedule stimuli. There was no meaningful 
relation between relative resistance to change 
of observing and any of the variables except 
for relative rate of primary reinforcement in 
the presence of the mixed-schedule stimuli. 
Interestingly, the mixed-schedule stimuli pro- 
vide the context in which observing responses 
actually occur. Thus, relative resistance to 
change of observing appears to have been 
determined by the rates of primary reinforce- 
ment obtained in the context in which 
observing was occurring. Although parameters 
of conditioned reinforcement had systematic 
effects on rates of observing, they had no 
systematic effect on response strength of 
observing as measured by resistance to change. 
If resistance to change is accepted as a more 
appropriate measure of response strength 
than response rates, then stimuli normally 
considered to function as conditioned reinforcers
(i.e., S+ deliveries) do not appear to
impact response strength in the same way as 
primary reinforcers. 

Is Conditioned “Reinforcement” a Misnomer? 

Based only on the resistance to change of 
observing findings above, one could be forgiven 
for dismissing any suggestion that conditioned 
reinforcers do not acquire the capacity to 
strengthen behavior. The procedures are com- 
plex, the experiments have all been conducted 
in one laboratory, and the interpretation 
hinges on accepting resistance to change as 
the appropriate measure of response strength. 
Nonetheless, the findings of those experiments 
have led me to reconsider other long-standing 
assertions that conditioned reinforcers do not 
actually strengthen behavior. 

Perhaps the most cited threat to the notion 
that stimuli associated with primary reinforcers 
themselves come to strengthen behavior is a 
series of experiments by Schuster (1969). In a
concurrent-chains procedure with pigeons,
Schuster arranged equal VI 60-s initial links 
that both produced VI 30-s terminal links in 
which a brief stimulus preceded food deliver- 
ies. In one terminal link, additional presenta- 
tions of the food-paired stimulus were present- 
ed on a fixed-ratio (FR) 11 schedule. Response 
rates in the terminal link with the additional 
food-paired stimuli were higher than in the 
terminal link without the stimuli. Based on 
these higher response rates alone, one might 
conclude that the food-paired stimulus func- 
tioned as a conditioned reinforcer in the 
traditional sense. If the stimulus presentations 
were reinforcers, one might also expect that 
they would produce a preference for the 
terminal link in which they occurred, much 
like the addition of primary reinforcers would. 
But Schuster found that, if anything, there was 
a preference for the terminal link without the 
added stimuli. In a similar arrangement in a 
multiple schedule, Schuster found that added 
response-dependent presentations of the food- 
paired stimulus in one component increased 
response rates, but failed to produce contrast, 
an effect that was obtained when primary 
reinforcers were added. These results led 
Schuster and many subsequent observers 
(e.g., Rachlin, 1976) to conclude that response 
rate increases produced by the paired stimuli 
were a result of some process other than 
reinforcement. A similar interpretation may be 
applied to the concurrent observing data of 
Shahan et al. (2006) and the resistance to 
change data of Shahan and Podlesnik (2005, 
2008a,b). The higher response rates obtained 
with more frequent or higher valued S+
presentations may reflect some process other 
than strengthening by reinforcement. 

Although many interpretations have been 
offered as alternatives to a reinforcement-like 
strengthening process, a common suggestion 
is that stimuli predictive of primary reinforcers 
function as signals for primary reinforcers and 
thus serve to guide rather than strengthen 
behavior (e.g., Bolles, 1975; Davison & Baum, 
2006; Longstreth, 1971; Rachlin, 1976; Stad- 
don, 1983; Wolfe, 1936). Although many 
names might be used to refer to stimuli in 
such an account (e.g., discriminative stimuli, 
signals, feedback, etc.), for reasons that will be 
discussed later, I am partial to “signposts” or 
“means to an end” as suggested by Wolfe 
(1936), Longstreth (1971), and Bolles (1975).
In what follows, I will not attempt to show how
a signpost account may be applicable to the
vast array of data generated under the rubric
of conditioned reinforcement. What I will do
is briefly discuss some examples raising the
possibility that a signpost or means-to-an-end
account is plausible. 

Consider a set of experiments by Bolles 
(1961). In one experiment, rats responded on 
two concurrently available levers that both
intermittently produced food pellets and an 
associated click. The scheduling of pellet 
deliveries was such that receipt of a pellet on 
one lever was predictive of additional pellets 
on that lever. During extinction, both levers 
were present, but only one intermittently 
produced the click. As might be expected if 
the click were a conditioned reinforcer, the 
rats showed a shift in preference for the lever 
that produced the click during extinction. In a 
second experiment, however, the scheduling 
was such that a pellet on one lever was 
predictive of subsequent pellets on the other 
lever. During extinction with the click available
for pressing one of the levers, the rats
showed a shift in preference away from the 
lever that produced the click. Based on this 
outcome, Bolles concluded that the food- 
associated click commonly used in early 
experiments on conditioned reinforcement 
might not be a reinforcer at all, but merely a 
signal for how food is to be obtained. 

More recently, Davison and Baum (2006) 
reached the same conclusion as Bolles using a 
related procedure with pigeons. They used a 
frequently changing concurrent schedules 
procedure in which the relative rates of 
primary reinforcement varied across un- 
signaled components arranged during the 
session. In the first experiment, a proportion 
of the food deliveries were replaced with 
presentation of the food-magazine light alone. 
The magazine light was paired with food 
deliveries and could reasonably be expected 
to function as a conditioned reinforcer. 
Previous work with similar frequently changing
concurrent schedule procedures has shown 
that the delivery of primary reinforcers pro- 
duces a pulse of preference to the option that 
produced them (Davison & Baum, 2000). In
the Davison and Baum (2006) experiment, 
both food deliveries and magazine-light deliv- 
eries produced preference pulses at the option 
that produced them, but the pulses produced
by magazine lights tended to be smaller. This
outcome is consistent with what might be 
expected if the stimuli were functioning as
conditioned reinforcers. However, it is impor- 
tant to note that because the magazine-light 
presentations replaced food deliveries on an 
option, the ratio of magazine light deliveries 
on the two options was perfectly predictive of 
the ratio of food deliveries arranged. Thus, as 
in Bolles (1961), the stimulus presentations 
were also a signal for where food was likely to 
be found. In a second experiment, Davison 
and Baum (2006) explored the role of such a 
signaling function by arranging for correlations
of +1, 0, or −1 between stimulus
presentations and food deliveries across con- 
ditions. They also examined whether the 
pairing of the stimulus with food deliveries 
was important by arranging similar relations 
with an unpaired keylight. They found that the 
pairing of the stimulus with food did not 
matter, but that the correlation of the stimulus 
with the location of food did matter. As in the 
Bolles experiments, if the stimulus predicted 
more food on an option, the preference pulse 
occurred on that option, but if the stimulus
predicted food on the other option the pulse 
occurred on that other option. Based on this 
outcome, Davison and Baum also suggested 
that all conditioned reinforcement effects are 
really signaling effects. 

The experiments of Bolles (1961) and 
Davison and Baum (2006) above are provided 
as examples of how one might view food- 
associated stimuli as signposts for food. Stimuli 
associated with primary reinforcers by defini- 
tion predict when and where that primary 
reinforcer is available. A signpost-based account 
suggests that when a response produces a
stimulus associated with a primary reinforcer,
the stimulus serves to guide the animal to the 
primary reinforcer by providing feedback about 
how or where the reinforcer is to be obtained. 
Responding that produces a signpost might 
occur not because the signpost strengthens the
response in a reinforcement-like fashion, but 
because production of the signpost is useful for 
getting to the primary reinforcer. This is the 
sense in which Wolfe (1936), Longstreth 
(1971), and Bolles (1975) suggested that 
signposts might also be thought of as means 
to an end for acquiring primary reinforcers. 

To explore the notion that stimuli associated
with primary reinforcers might be thought
of as means to an end, consider an experiment
by Shahan and Jimenez-Gomez (2006) exam- 
ining alcohol-associated stimuli with rats. An 
observing-response procedure arranged alter- 
nating periods of unsignaled extinction and a 
random ratio (RR) 25 schedule of alcohol 
solution delivery on one lever. Responses on a 
separate (i.e., observing) lever produced 15-s 
presentations of stimuli correlated with the RR 
25 (i.e., S+) and extinction (i.e., S−) periods.
Across conditions, alcohol concentrations of 
2%, 5%, 10%, and 20% were examined with a 
fixed 0.1-ml volume of alcohol deliveries across 
conditions. A robust finding with a variety of 
self-administered drugs is that overall drug 
consumption increases as a function of the 
dose of each drug delivery, but the number of 
drug deliveries earned first increases and then 
decreases as a function of dose (see Griffiths, 
Bigelow, & Henningfield, 1980, for review). The
reason for this outcome is thought to be that 
lower doses are poorer reinforcers, producing 
fewer deliveries and lower total consumption. 
As the dose increases, drug deliveries become 
more potent reinforcers, but fewer of them are 
required to achieve larger total amounts of 
drug consumption and satiation. Thus, unlike 
the typical arrangement with food reinforcers, 
variations in the magnitude of drug reinforc- 
ers in self-administration preparations often 
span the range from a relatively ineffective 
reinforcer to one that produces satiation and 
fewer total reinforcer deliveries in a session. 
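
The descending limb of this dose-effect relation can be illustrated with simple arithmetic. The sketch below uses the concentrations and 0.1-ml dipper volume of the Shahan and Jimenez-Gomez (2006) procedure, but it assumes that concentrations are volume/volume percentages and that an animal works toward a fixed target intake. The 200-µl target and the function names are hypothetical, chosen only for illustration, and are not data from any study cited here.

```python
import math

DIPPER_VOLUME_UL = 100.0             # 0.1-ml dipper, expressed in microliters
CONCENTRATIONS_PCT = [2, 5, 10, 20]  # alcohol concentrations examined across conditions
HYPOTHETICAL_TARGET_UL = 200.0       # assumed target ethanol intake (illustrative only)


def ethanol_per_delivery_ul(concentration_pct: float) -> float:
    """Ethanol volume per dipper, assuming a volume/volume concentration."""
    return DIPPER_VOLUME_UL * concentration_pct / 100.0


def deliveries_needed(target_ul: float, concentration_pct: float) -> int:
    """Smallest whole number of dippers that reaches the target ethanol volume."""
    return math.ceil(target_ul / ethanol_per_delivery_ul(concentration_pct))


for pct in CONCENTRATIONS_PCT:
    per_dipper = ethanol_per_delivery_ul(pct)
    needed = deliveries_needed(HYPOTHETICAL_TARGET_UL, pct)
    print(f"{pct:>2}% solution: {per_dipper:4.1f} ul ethanol per dipper, "
          f"{needed:3d} dippers to reach target")
```

The sketch captures only the regulation side of the argument, in which higher doses require fewer deliveries. It does not model the ascending limb, which the text attributes to low doses being poor reinforcers.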

The question addressed by Shahan and 
Jimenez-Gomez (2006) was how such varia- 
tions in the unit dose of an alcohol reinforcer 
would affect responding maintained by a 
conditioned reinforcer associated with the 
availability of the alcohol deliveries. Given that 
alcohol is the functional reinforcer in the 
solutions, one might reasonably expect that an 
S+ associated with higher concentrations
would support more observing. The reason is
that the S+ is associated with a greater
magnitude of primary reinforcement and
higher overall consumption of the primary
reinforcer. On the other hand, as a means to
an end, S+ deliveries might track the number
of needed dipper deliveries instead of overall 
alcohol consumption. Figure 5 shows the data 
from the relevant conditions. Clearly, the 
number of stimulus presentations earned 
tracked the number of dipper deliveries 
earned, rather than overall alcohol consumption.
Although the number of stimulus presentations
earned is depicted in the figure,
because data from an FR 1 schedule of observing
are presented, observing-response rates
showed the same pattern. A similar pattern was
obtained with a range of other FR schedule
values on the observing lever (data not shown
here). Thus, observing maintained by the
alcohol-associated stimulus presentations
might be thought of as a means to acquire
dippers of alcohol, rather than as being
strengthened by presentations of alcohol-
associated stimuli. When fewer dippers are
needed, fewer stimulus presentations are also
needed.

Fig. 5. Effects of concentration of an orally self-
administered alcohol solution on the number of alcohol
dipper and stimulus presentations (left y-axis) and g/kg
ethanol delivered (right y-axis) in the observing-response
procedure of Shahan and Jimenez-Gomez (2006). Data are
means (± 1 SEM) for 7 rats. Reprinted with permission of
Behavioural Pharmacology.

Viewing stimuli associated with primary 
reinforcers as a means to an end highlights 
the relation between such stimuli and tokens. 
As noted by Hackenberg (2009), “A token is
an object or symbol that is exchanged for 
goods and services.” In fact, it was research on 
tokens with chimpanzees that led Wolfe 
(1936) to suggest a means-to-an-end interpre- 
tation in the first place. Tokens are easily 
viewed as a means to an end for primary 
reinforcers, because that is precisely what they 
are, a medium of exchange. Obviously, things 
are somewhat less clear with stimuli produced 
by observing responses, but the data from the 
Shahan and Jimenez-Gomez experiment sug- 
gest that the economic aspects of the situation 
might play a role. The use of a drug reinforcer 
in Shahan and Jimenez-Gomez is important 
because drug reinforcers often result in fairly
large changes in motivation as a result of rapid
satiation, dependent upon the dose as noted 
above. This aspect of drug reinforcers is 
precisely the reason that regulation-based 
economic theoretical approaches have been 
so popular and useful in the drug self- 
administration literature (e.g., Bickel, De- 
Grandpre, & Higgins, 1993; Hursh, 1991). 
Such economic considerations also may help 
to put the role of stimuli associated with 
primary reinforcers into perspective — they 
are useful for obtaining primary reinforcers 
in a manner that is likely affected by the 
economic or regulatory circumstances of the 
primary reinforcer. Although tokens are often 
viewed as conditioned reinforcers in addition 
to being a medium of exchange, the evidence 
for response-strengthening effects of tokens 
above and beyond their role as a means to an 
end or signposts is fairly weak (see Hacken- 
berg, 2009, for review). Unfortunately, rela- 
tively little contemporary research has been 
conducted on token-maintained behavior, and 
strangely, even less work has been conducted 
integrating tokens into modern regulation- 
based behavioral economic accounts. Work by 
Hackenberg and colleagues (see Hackenberg, 
2009) is helping to fill this gap and could lead 
to a more systematic and internally consistent 
economic-based account of what we once 
called conditioned reinforcers. In short, al- 
though tokens are often viewed as conditioned 
reinforcers, it might be useful instead to 
consider viewing conditioned reinforcers as 
tokens (i.e., means to an end). 

If the stimuli in our observing experiments 
are to be viewed as signposts and means to an 
end, one immediate concern is that unlike 
tokens or terminal-link stimuli in chain schedules,
S+ presentations are not required to
obtain primary reinforcers. This aspect of the 
observing-response procedure is precisely why 
it was used in the experiments discussed above. 
But, if one assumes that evolution has 
equipped organisms to follow predictive stim- 
uli in order to obtain food, it is not too 
surprising that behavior that produces such 
stimuli would occur in an observing-response 
procedure. S+ presentations might be considered
a means to an end because they are
instrumental for guiding organisms to the 
richer than average patches of primary rein- 
forcement they signal. Since they were first 
introduced, observing responses have been
considered a proxy for looking at or for
important environmental stimuli (Wyckoff, 
1952; see also Dinsmoor, 1985). In order for 
behavior to be appropriately allocated to the 
periods of reinforcement and nonreinforce- 
ment arranged by a multiple schedule, an 
organism must be in contact with the relevant 
stimuli. When such stimuli are removed and 
then made contingent upon an arbitrary 
response (i.e., an observing response), that 
arbitrary response functionally becomes part 
of the looking or attending to the relevant 
stimuli. Although S+ presentations are not
required to obtain primary reinforcers, such 
looking or attending is required for ultimate 
effective action with respect to the primary 
reinforcer. That is the sense in which one 
might say the organism is using observing 
responses and the stimuli they produce as a 
means to an end. 

Clearly, the interpretation above focusing 
on the utility of S+ deliveries and the more
general signposts or means-to-an-end ap- 
proach being explored is related to the 
information or uncertainty-reduction hypoth- 
esis of conditioned reinforcement. What 
might be called the strong uncertainty reduc- 
tion hypothesis suggests that a reduction of 
uncertainty per se is reinforcing (e.g., Hendry, 
1969). In the observing-response procedure, 
observing responses reduce uncertainty by 
producing stimuli correlated with the condi- 
tions of primary reinforcement in effect. 
Importantly, both S+ and S− reduce uncertainty
by the same amount (i.e., 1 bit), and
should both function as reinforcers. Contrary
to this suggestion, a considerable amount of
research has shown that S− presentations
alone are not reinforcing, at least with
pigeons, thus posing a serious challenge to
the strong uncertainty hypothesis (see Dinsmoor,
1983, for review). Having said that,
there is some compelling evidence that S− can
function as a reinforcer for adult humans 
under some circumstances (e.g., Perone & 
Kaminski, 1992). 
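
The 1-bit figure follows from Shannon's measure of uncertainty when the two schedule conditions are equally likely. A minimal sketch, assuming the food-available and extinction periods each occur with probability 0.5 (an assumption made here for illustration; the function name is not from the source):

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Before an observing response, the subject cannot tell which condition is in effect.
uncertainty_before = entropy_bits([0.5, 0.5])

# After either S+ or S-, the condition in effect is fully specified.
uncertainty_after = entropy_bits([1.0])

print(uncertainty_before - uncertainty_after)  # bits removed by either stimulus
```

On this measure the reduction is identical whichever stimulus appears, which is why the strong hypothesis predicts that S− should reinforce observing as effectively as S+. If the two conditions were not equally likely, the prior uncertainty would be less than 1 bit.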

Regardless, the fact that S− alone typically
fails to function as a reinforcer misses the
point of the signpost or means-to-an-end
account being explored here. A signpost
account need not assert that either S+ or S−
functions as a reinforcer. Signposts guide
behavior, and earning them is instrumental as
a means to an end with respect to effective
action related to the primary reinforcer. Such
an account is not based on negative reinforcement
produced by a reduction in the aversive
properties of uncertainty nor the positive
conditioned reinforcing effects of S+. It is
not at all surprising that a response producing
S− alone does not continue in most of the
research examining the reinforcing effects of
S−. The procedures are rather restrictive and
leave no way for the subjects to better their
situation. Typically, subjects have nothing to
do but wait out the S− period. To use the
metaphor of a signpost, imagine riding in a car 
and having no control over the direction it is 
headed. Looking out the window, you see signs 
revealing that you are headed in the wrong 
direction. How long would you continue 
looking for or at the signs? Clearly the signs 
are informative, but they are not useful. If on 
the other hand, you could change the direc- 
tion of the car when error-revealing signs are 
encountered, you might be more likely to 
continue looking. The typical S− alone
procedure used in observing experiments is 
more similar to the former than the latter case. 
A signpost that tells you that you are headed in 
the wrong direction is a signal to change 
directions. If you cannot change directions, 
the signpost is not useful. 

It is well established that when organisms are 
allowed to control the duration of S+ and S−,
they tend to maintain S+ and to terminate S−
(see Dinsmoor, 1983, for review). Similarly,
there is evidence that organisms prefer an
observing response that produces S+ alone to
one that produces S+ and S− (Mulvaney,
Dinsmoor, Jwaideh, & Hughes, 1974). The
usual interpretation of these findings is that
the S− is aversive and functions as a conditioned
punisher of observing. An interpretation
based on a signposts approach suggests that the
S− presentations serve as a signal to change
course and do something else. Thus, S− might
be seen as a signal to go in another direction by
disengaging from an observing response that
continues to lead in the wrong direction. If, on
the other hand, S− is permitted to be useful by
allowing a period of rest during a difficult task
or providing time to do something else useful,
responding that produces S− is maintained
(e.g., Case, Fantino, & Wixted, 1985; Case,
Ploog, & Fantino, 1990; Perone & Baron, 1980).
It is worth noting that the version of the
information hypothesis originally proposed by
Berlyne (1956) is much more in line with this
notion of the usefulness of stimuli than with the
strong uncertainty hypothesis requiring that an
S− must always maintain responding (see
Lieberman, Cathro, Nichol, & Watson, 1997, 
for discussion). 

At this point, I suspect many readers will not 
have fundamentally changed their minds 
about whether stimuli associated with primary 
reinforcers strengthen behavior that produces 
them. This is especially likely given that in 
previous reviews of the conditioned reinforce- 
ment literature, both Fantino (1977) and 
Williams (1994a,b) concluded that an account 
based on conditioned strengthening effects is 
to be preferred over alternative accounts (e.g., 
uncertainty reduction, marking, bridging). 
Fantino (1977) based his conclusion largely 
on the failures of strong uncertainty reduction 
noted above. But, strong uncertainty reduction 
has always served as something of a straw man 
for a more nuanced informational account 
(Berlyne, 1956). Hopefully the discussion 
above has made it clear that rejection of 
strong uncertainty reduction does not inexo- 
rably lead to acceptance of response strength- 
ening by conditioned reinforcement as the 
only remaining alternative. Williams (1994a,b) 
concludes in favor of strengthening by condi- 
tioned reinforcement based largely on the 
effects of food-paired stimuli during delays to 
reinforcement in discrimination procedures 
(e.g., Cronin, 1980). The data from such delay- 
to-reinforcement procedures do appear to 
show that alternative concepts like marking 
and bridging cannot account for all putative 
conditioned reinforcement effects. But ruling 
out marking and bridging is not the same 
thing as showing that the remaining effects 
must be due to a reinforcement-like strength- 
ening process. Williams (1994a,b) places 
considerable weight on Cronin’s finding that 
pigeons will choose an option that immediate- 
ly produces a stimulus paired with food at the 
end of a 1-min delay over an option that 
produces a stimulus normally paired with the 
absence of food at the end of the delay, 
despite the fact that doing so decreases the 
overall rate of food obtained. The fact that 
animals sometimes behave suboptimally in the 
long run as a result of being misguided by 
stimuli normally predictive of food delivery is 
not terribly surprising when the long-term 
consequences occur after considerable delays. 
Again, this outcome is not evidence that such 
immediate effects of stimuli must be due to 
the reinforcement-like strengthening effects of 
food-associated stimuli. Unfortunately, empir- 
ically differentiating between a reinforcement- 
like strengthening account and a signpost or 
means-to-an-end account is difficult under 
these and many other conditions. My inten- 
tion here is not to assert that the available data 
force acceptance of a signposts account and 
rejection of a conditioned reinforcement 
account. Rather, the point is that a signposts 
account is worth considering, and there is 
ample room for differences in perspective to 
guide interpretation of the existing data. 

Regardless, in considering whether stimuli 
associated with primary reinforcers strengthen 
responding that produces them, it is not 
unreasonable to ask the related question of 
whether primary reinforcers themselves 
strengthen behavior that produces them. 
Certainly if primary reinforcers do not 
strengthen behavior, any discussion of wheth- 
er conditioned reinforcers strengthen behav- 
ior would be pointless. As discussed below, the 
possibility that primary reinforcers also might 
not strengthen behavior appears to be worthy 
of consideration. 

Do Primary Reinforcers Strengthen Behavior? 

In their influential treatment of Pavlovian 
and operant conditioning, Gallistel and Gib- 
bon (2002) argue that strengthening by a US 
or primary reinforcer is not the process 
underlying conditioning. As an alternative, 
they suggest that organisms learn what, where, 
and when events occur in the environment 
and that they compare events, durations, and 
rates to make threshold-based choices. Thus, 
their alternative to strengthening is a compar- 
ator approach based largely on Scalar Expec- 
tancy Theory augmented by a host of more 
specific models for calculating the relevant 
quantities to be compared. Although one 
might quibble with the cognitive flavor of the 
statistical-decision-theory-based modeling ap- 
proach, it does not change the fact that 
organisms might learn about the environment 
and make comparisons rather than having 
responses strengthened by reinforcers via back 
propagation. Gallistel and Gibbon present and 
reinterpret a wealth of data to support the 
feasibility of their general approach. A set of 
experiments by Cole, Barnet, and Miller 
(1995) provides an example that is particularly 
challenging for a response-strengthening ac- 
count of conditioning. 

Cole et al. (1995) examined the role of 
temporal-map learning in Pavlovian condition- 
ing. In the relevant portions of the experi- 
ments, fear conditioning was compared for 
two groups of rats. The first group received a 5- 
s tone followed immediately by a shock 
presentation (i.e., standard delay condition- 
ing). The second group received a 5-s tone 
followed by a 5-s trace interval without the tone 
present and then the shock (i.e., standard 
trace conditioning). Not surprisingly, when a 
subset of control rats from each group was 
tested with the tone alone, the tone was a less 
effective fear CS for the group with the trace 
interval. The standard interpretation of such 
an effect of a trace interval is that the lack of 
contiguity between the CS and US has disrupted 
the strengthening of the CS-US 
bond or the transfer of value from the US to 
the CS. In the important comparison, the 
remaining rats from the delay and trace 
conditioning groups were exposed to a backward 
second-order conditioning phase. Specifically, 
the original 5-s tone CS was then 
followed by a 5-s clicking stimulus and the 
US was not presented. This is second-order 
conditioning because the original CS was 
being paired with the novel clicking stimulus. 
It is backward second-order conditioning 
because the original CS preceded the novel 
stimulus rather than following it, as it usually 
does in second-order conditioning. The results 
showed relatively little conditioning to the 
novel clicking stimulus for the delay-condi- 
tioning animals, but strong conditioning for 
the trace-conditioning animals. This result is 
surprising from a traditional strengthening 
account for a couple of reasons. First, the 
original tone CS for the trace-conditioning 
group should have been strengthened less 
than for the delay-conditioning group. Thus, it 
should have had less value or ability to 
strengthen the novel CS in the second-order 
phase of the experiment. Second, the second- 
order phase of the experiment involved a 
backward conditioning procedure in which 
one would traditionally expect little strength- 
ening to occur because the order of events is 
in the wrong direction. 

Even more problematic for a strengthening- 
based account of conditioning, Cole et al. 
(1995) conducted another version of the 
experiment in which they reversed the order 
of the conditions. Thus, the rats experienced a 
sensory preconditioning phase in which a 
neutral tone was followed by the clicking 
stimulus without the US and were then exposed to 
the delay- or trace-conditioning phases involv- 
ing the US. This reversal of the order of phases 
did not affect the results. Again, a traditional 
transfer-of-value or strengthening account is 
strained, to say the least, by the effectiveness of 
the backward sensory preconditioning for the 
group subsequently exposed to trace condi- 
tioning. Although Cole et al. note that a more 
traditional strengthening-based account of the 
results of their experiments is not impossible, 
such an account is fairly convoluted and raises 
concerns about plausibility and parsimony. 
Nonetheless, as Gallistel and Gibbon (2002) 
explain, the result does make sense if one 
assumes that the animals learned about the 
temporal structure of the sequence of events 
across the conditions. 
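
The temporal-structure reading of these results can be made concrete with a small sketch. The code below is only an illustration, assuming the timings described above (a 5-s tone, a shock either at tone offset or after a further 5-s trace interval, and a 5-s click beginning at tone offset) and assuming that the animals align the two learned sequences on the shared tone. The function and variable names are hypothetical and are not taken from Cole et al. (1995) or Gallistel and Gibbon (2002).

```python
# Illustrative sketch only: all times are in seconds from tone onset.
TONE_DURATION = 5.0            # 5-s tone CS
CLICK_ONSET = TONE_DURATION    # the click begins when the tone ends


def expected_shock_time(trace_interval):
    """Time of the shock relative to tone onset in the conditioning phase."""
    return TONE_DURATION + trace_interval


def click_to_shock_delay(trace_interval):
    """Expected shock time relative to click onset.

    Assumes (hypothetically) that the tone-shock and tone-click
    sequences are combined by aligning them on the shared tone.
    """
    return expected_shock_time(trace_interval) - CLICK_ONSET


for label, trace in (("delay", 0.0), ("trace", 5.0)):
    delay = click_to_shock_delay(trace)
    if delay > 0:
        relation = "click onset precedes the expected shock"
    else:
        relation = "expected shock comes no later than click onset"
    print(f"{label}: shock expected {delay:.1f} s after click onset ({relation})")
```

Under these assumptions, only the trace-conditioning group ends up with a combined sequence in which the click comes before the expected shock, and that is the group that showed strong conditioning to the click.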

Interestingly, even in the absence of the 
Cole et al. (1995) data, Gallistel and Gibbon 
(2002) note that the simple phenomenon of 
sensory preconditioning is damaging enough 
in its own right to the traditional approach. If 
reinforcement-like strengthening is required 
for conditioning, then learning should not 
occur when two neutral stimuli are paired in 
the absence of a US (i.e., sensory precondi- 
tioning should not occur). They argue that the 
phenomenon has been largely ignored in 
terms of its implication for conditioning 
theory and has instead been given a name to 
suggest that it is somehow different from 
“real” conditioning (i.e., “sensory” and 
“pre” conditioning). From their perspective, 
sensory preconditioning is just another in- 
stance of organisms learning about the struc- 
ture of the environment and what events 
predict about where and when other events 
occur. From such a timing-based comparator 
perspective, reinforcement is important for 
performance, not learning. 

Although the most challenging examples 
presented by Gallistel and Gibbon (2002) 
come from Pavlovian preparations, they argue 
that the processes are fundamentally the same 
for Pavlovian and operant conditioning. In 
addition, their argument makes use of a large 
amount of data from operant timing proce- 
dures that also seem to challenge the notion 
that reinforcers strengthen behavior. Al- 
though the challenges raised by Gallistel and 
Gibbon have largely been ignored by those of 
us who study operant conditioning, Davison 
and Baum (2006) have nevertheless similarly 
suggested that primary reinforcers may not 
strengthen behavior. Davison and Baum made 
this suggestion based upon the signaling 
effects of conditioned reinforcers discussed 
above, and upon related findings with primary 
reinforcers obtained by Krägeloh, Davison, 
and Elliffe (2005). Thus, Davison and Baum 
suggest that: 

“The most general principle, rather than a 
strengthening and weakening by consequenc- 
es, may be that whatever events predict 
phylogenetically important (i.e., fitness-en- 
hancing or fitness-reducing) events, such as 
food and pain, will guide behavior into 
activities that produce fitness-enhancing events 
and into activities that prevent the fitness- 
reducing events.” 

Although I suspect that Gallistel and Gibbon 
and Davison and Baum would disagree about 
the details of how to construct such a 
framework, I hope that the basic similarity in 
their approaches is obvious: organisms learn 
the predictive relations between events in the 
environment and are guided by what is 
learned. Davison and Baum suggested that 
such guidance of behavior by what are 
traditionally thought of as reinforcers and 
punishers might result from such events 
serving as discriminative stimuli for future 
occurrences of those events. In my view, the 
traditional notion of a discriminative stimulus 
as a stimulus that sets the occasion for a 
reinforced response retains little utility if a 
“reinforcer” is considered only to be discrim- 
inative for future occurrences of itself. Above, I 
noted that many names including “discrimi- 
native stimulus” might be used for stimuli 
usually referred to as conditioned reinforcers 
when they are instead considered to guide 
rather than to strengthen behavior. My pref- 
erence for the terms “signpost” and “means- 
to-an-end” is based on this more general 
viewpoint that the term “discriminative stim- 
ulus” makes little sense if both primary and 
conditioned reinforcers serve to guide rather 
than to strengthen behavior. 

Terminological issues notwithstanding, the 
available data also leave ample room for one’s 
general perspective to determine how the 
effects of primary reinforcers are to be 
interpreted. Nonetheless, the challenges 
raised by Gallistel and Gibbon (2002) and 
Davison and Baum (2006) about how primary 
reinforcers have their effects are certainly 
worthy of serious consideration. However, 
even if one dismisses such suggestions and 
continues to believe that primary reinforcers 
strengthen operant behavior, the challenges 
raised by the data from Pavlovian procedures 
reviewed by Gallistel and Gibbon would 
continue to pose problems for the notion that 
the strengthening effects of primary reinforc- 
ers are transferred to conditioned reinforcers. 
The reason is that such a transfer of the ability 
to strengthen behavior would continue to 
require Pavlovian conditioning. Although we 
are pleased to have the effects of stimuli on 
operant behavior grounded in widely accepted 
models of Pavlovian conditioning, in reality, 
relatively little serious attention has been paid 
to the potential implications of recent devel- 
opments in the Pavlovian literature. 

Implications for Choice Theories and Behavioral 
Momentum Theory 

If one were to accept that primary and/or 
conditioned reinforcers do not strengthen 
behavior that produces them, what would be 
the implications for the general choice theo- 
ries discussed earlier? Obviously, choice theo- 
ries like GGM and HVA are based upon the 
value or strengthening effects of conditioned 
reinforcers. However, discarding the notion 
that conditioned and primary reinforcers 
strengthen behavior would likely have little 
real impact on the theories. The reason is that 
each of the theories quantitatively describes 
systematic changes in behavior with changes in 
the structure of the environment. The notion 
of value rests in the conceptual interpretation 
of the relations captured by the quantitative 
machinery of the models, and value need not 
carry any connotation of reinforcement-like 
response strengthening for the models to be 
useful. Such an approach would be consistent 
with the more general response-strength-free 
behavioral economic/regulatory or consump- 
tion-based approaches within which the mod- 
els might be subsumed (e.g., Rachlin, Green, 
Kagel, & Battalio, 1976; Rachlin, Battalio, 
Kagel, & Green, 1981). Further development 
of choice theories like GGM and HVA based 
on conditioned reinforcers as signposts or 
means to an end might be helpful for further 
incorporating both tokens and stimulus-based 
guidance of behavior into a broader economic 
or regulatory framework. 

The implications of dropping the notion of 
response strength are somewhat more compli- 
cated when considering behavioral momen- 
tum theory. If reinforcers do not strengthen 
behavior, then what is it that behavioral 
momentum theory captures? Baum (2002) 
has suggested that the effects of different rates 
of reinforcement in different stimulus con- 
texts might be thought of as the stimuli 
enjoining (i.e., instructing, directing, urging) 
different allocations of behavior during base- 
line conditions. During disruption, a stimulus 
associated with a higher rate of reinforcement 
might be thought of as enjoining that alloca- 
tion of behavior more persistently. Although 
this enjoining approach resembles a signposts 
account on its surface, Baum’s suggestion that 
enjoining itself might be strengthened by 
reinforcement appears similar enough to the 
Pavlovian-strengthening account of behavioral 
momentum theory that the two are difficult to 
distinguish until the enjoining approach is 
further developed. Nonetheless, Baum’s 
general approach of focusing on the guiding 
or directive effects of stimuli signaling differ- 
ent rates of primary reinforcers, rather than on 
the putative strengthening effects of reinforc- 
ers in a stimulus context, seems like a 
reasonable way to proceed. There is no reason 
in principle that an enjoining function of 
stimuli need be based on strengthening by 
reinforcement. 

Alternatively, Gallistel and Gibbon (2002) 
suggested a timing-based account of resistance 
to extinction based on a comparison of the 
time since the occurrence of the last reinforcer 
in the presence of a stimulus to the overall 
average time between reinforcers in the 
stimulus (i.e., rate estimation theory). Al- 
though rate estimation theory apparently does 
a good job with many findings in the Pavlovian 
literature, an explicit test of its predictions in a 
typical behavioral-momentum type experi- 
ment with multiple schedules of reinforce- 
ment has questioned its applicability to behav- 
ioral momentum theory (Nevin & Grace, 
2005). In addition, at present, rate estimation 
theory appears to be limited to disruption by 
extinction and does not provide a broader 
framework within which to understand the 
effects of differential reinforcement rate on 
resistance to other types of disruption (e.g., 
satiation). Nonetheless, the failures of the 
particular version of a timing-based account 
of resistance to extinction provided by rate 
estimation theory do not mean that another 
such account based on different assumptions 
cannot provide a viable alternative to the 
reinforcement-strengthening account of be- 
havioral momentum theory. Regardless, any 
such account would still need to address the 
fact that variations in parameters of primary 
reinforcement impact resistance to change, 
but parameters of conditioned reinforcement 
appear not to. 
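
The comparison that the rate estimation account described above relies on can be sketched in a few lines. The code below is a minimal illustration, assuming a simple ratio rule in which responding in a stimulus stops once the time since the last reinforcer exceeds some multiple of the average time between reinforcers in that stimulus. The threshold value and all names are hypothetical placeholders, not parameters reported by Gallistel and Gibbon (2002).

```python
def extinction_ratio(time_since_last_reinforcer, mean_interreinforcer_interval):
    """Ratio of time since the last reinforcer to the average
    time between reinforcers in the stimulus (both in seconds)."""
    return time_since_last_reinforcer / mean_interreinforcer_interval


def keeps_responding(time_since_last_reinforcer,
                     mean_interreinforcer_interval,
                     threshold=5.0):
    """True while the ratio is still below the decision threshold.

    The default threshold of 5.0 is an arbitrary value chosen for
    illustration only.
    """
    ratio = extinction_ratio(time_since_last_reinforcer,
                             mean_interreinforcer_interval)
    return ratio < threshold


# Two hypothetical stimuli, 200 s into extinction.
for label, mean_interval in (("30-s average interval", 30.0),
                             ("120-s average interval", 120.0)):
    ratio = extinction_ratio(200.0, mean_interval)
    print(f"{label}: ratio = {ratio:.2f}, "
          f"responding = {keeps_responding(200.0, mean_interval)}")
```

Under this assumed rule, the time needed to reach the threshold scales directly with the average time between reinforcers in the stimulus.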

In conclusion, perhaps it goes without 
saying that considerably more research needs 
to be conducted on the effects of parameters 
of conditioned reinforcement on choice and 
resistance to change. Ideally, such work would 
come from a variety of procedures to study 
conditioned reinforcement. Given the relation 
between the signpost or means-to-an-end 
account and token reinforcement, examina- 
tions of how variations in parameters of token 
reinforcement affect choice and resistance to 
change might be especially helpful. Another 
potentially fruitful question would be whether 
preparations like those in Cole et al. (1995) 
discussed above could be extended from fear 
conditioning to the generation of positive 
conditioned reinforcers for operant behavior. 
If they could, the results might pose a serious 
challenge to the notion that conditioned 
reinforcers have their effects via an acquired 
capacity to strengthen behavior that produces 
them. Hopefully, such research would also be 
accompanied by further exploration of wheth- 
er primary reinforcers themselves impact 
behavior via a response-strengthening process. 